MEANTIME, the NewsReader Multilingual Event and Time Corpus
Abstract
In this paper, we present the NewsReader MEANTIME corpus, a semantically annotated corpus of Wikinews articles. The corpus consists of 480 news articles: 120 English news articles and their translations into Spanish, Italian, and Dutch. MEANTIME contains annotations at different levels. The document-level annotation includes markables (e.g. entity mentions, event mentions, time expressions, and numerical expressions), relations between markables (modeling, for example, temporal information and semantic role labeling), and intra-document entity and event coreference. The corpus-level annotation includes cross-document entity and event coreference. Semantic annotation of the English section was performed manually. For the annotation in Italian, Spanish, and (partially) Dutch, a procedure was devised to automatically project the annotations from the English texts onto the translated texts, based on the manual alignment of the annotated elements. This not only sped up the annotation process but also provided cross-lingual coreference. The English section of the corpus was extended with timeline annotations for the SemEval 2015 TimeLine shared task. The First CLIN Dutch Shared Task at CLIN26 was based on the Dutch section, while the EVALITA 2016 FactA (Event Factuality Annotation) shared task, based on the Italian section, is currently being organized.
Similar resources
Multilingual Event Detection using the NewsReader Pipelines
We describe a novel modular system for cross-lingual event extraction for English, Spanish, Dutch and Italian texts. The system consists of a ready-to-use modular set of advanced multilingual Natural Language Processing (NLP) tools. The pipeline integrates modules for basic NLP processing as well as more advanced tasks such as cross-lingual Named Entity Linking, Semantic Role Labeling and time...
Towards Multilingual Event Extraction Evaluation: A Case Study for the Czech Language
This paper presents a multilingual corpus of news, annotated with event metadata information. The events in our corpus are from the domain of violence, natural and man-made disasters. The main goal of the corpus is automatic evaluation of event detection and extraction systems in different languages. As a use case, we take a rule-based event extraction system, extend it to cover a new language, ...
Exploring the Usefulness of Cross-lingual Information Fusion for Refining Real-time News Event Extraction: A Preliminary Study
Nowadays, many influential facts are reported multiple times by different sources and in different languages. This paper presents the results of an experiment on deploying cross-lingual information fusion techniques for refining the results of a large-scale multilingual news event extraction system. An evaluation on a test corpus consisting of 618 event descriptions which refer to 523 real-worl...
The Multilingual City: Vitality, Conflict, and Change, Edited by Lid King & Lorna Carson (2006), Multilingual Matters, ISBN-13 978-1-78309-477-6
Multilingual Corpora - Current Practice and Future Trends
In this paper I would like to give an overview of multilingual corpus building to date. In doing so, I will review two types of multilingual corpus, parallel and translation corpora. Following this, I will consider what tools are currently available which allow for the exploitation of such corpora in the context of machine/machine-aided translation. Throughout I will give a fairly global view o...